Teaching Bandits How to Behave (Manuscript)

Authors

  • Yiling Chen
  • Jerry Kung
  • David Parkes
  • Ariel Procaccia
  • Haoqi Zhang
Abstract

Consider a setting in which an agent selects an action in each time period and there is an interested party who seeks to induce a particular action. The interested party can associate incentives with actions to perturb their value to the agent. The agent’s decision problem is modeled as a multi-armed bandit process where the intrinsic value for an action updates independently of the state of other actions and only when the action is selected. The agent selects the action in each period with the maximal perturbed value. In particular, this models the problem of a learning agent with the interested party as a teacher. For inducing the goal action as soon as possible, or as often as possible over a fixed time period, it is optimal for an interested party with a per-period incentive budget to assign the budget to the goal action and wait for the agent to learn to want to make that choice. Teaching is easy in this case. In contrast, with an across-period budget, no algorithm can provide good performance on all instances without knowledge of the agent’s update process, except in the particular case in which the goal is to induce the agent to select the goal action...
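To make the per-period-budget setting concrete, the following Python sketch simulates the strategy described above: place the entire per-period incentive budget on the goal action every round and wait for the agent, who always picks the action with maximal perturbed value, to come around. This is an illustrative toy, not the authors' model or code; the update rule sample_update, the true_means values, the budget, and all other names and parameters are hypothetical stand-ins for the agent's unknown update process.

```python
import random

# Hypothetical "true" payoffs the agent gradually learns; purely illustrative.
true_means = [1.0, 0.1, 0.15]

def sample_update(arm, old_value):
    # Illustrative stand-in for the agent's unknown update process: the
    # selected arm's intrinsic value drifts toward a noisy sample of its mean.
    return 0.5 * old_value + 0.5 * (true_means[arm] + random.gauss(0.0, 0.05))

def simulate_per_period_teaching(values, update, goal, budget, horizon, seed=0):
    # Per-period-budget strategy from the abstract: the interested party puts
    # its whole budget on the goal action every round; the agent picks the
    # action with maximal perturbed value, and only the selected action's
    # intrinsic value is updated.
    random.seed(seed)
    values = list(values)
    picks = []
    for _ in range(horizon):
        perturbed = [v + (budget if a == goal else 0.0)
                     for a, v in enumerate(values)]
        choice = max(range(len(values)), key=lambda a: perturbed[a])
        picks.append(choice)
        values[choice] = update(choice, values[choice])
    return picks

picks = simulate_per_period_teaching(values=[0.0, 0.9, 0.5], update=sample_update,
                                     goal=0, budget=0.4, horizon=50)
print(f"goal action selected in {picks.count(0)} of {len(picks)} periods")
```

In this toy run the agent tries the other arms first, their values settle below the goal action's perturbed value, and from then on the goal action is selected in almost every period. With an across-period budget, by contrast, the abstract notes that no such passive strategy gives good performance on all instances without knowledge of the agent's update process.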


Related articles

Survey of the relationship between clinical faculties' manner of teaching behaviors and nursing students' anxiety from the students' viewpoint at Guilan University of Medical Sciences, 2007

Introduction: Students' perceptions of how clinical faculty behave and relate to them (both positively and negatively) were noted to influence their anxiety levels and consequently their ability to learn and perform safely and effectively. Objective: This study aims to determine the relationship between clinical faculties' manner of teaching behaviors and nursing st...


A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements

We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...


A Comparison of Automatic Teaching Strategies for Heterogeneous Student Populations

Online planning of good teaching sequences has the potential to provide a truly personalized teaching experience with a huge impact on the motivation and learning of students. In this work we compare two main approaches to achieve such a goal, POMDPs that can find an optimal long-term path, and Multi-armed bandits that optimize policies locally and greedily but that are computationally more eff...


Asymptotically optimal priority policies for indexable and non-indexable restless bandits

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property an...


Protection and Social Order

Consider a simple world populated with two types of individuals, those who work and create wealth (peasants) and those who steal the property of others (bandits). With bandits about, peasants need to protect their output and can do so individually or collectively. But either way protection is costly; it consumes resources and interferes with an individual’s ability to create wealth. This study ...




Publication date: 2010